Aerosols are tiny particles suspended in the air that can be made up of sand, dust, smoke,
smog, and so on. Over the globe, aerosols have a major impact on air quality, the
hydrological cycle, and climate; yet the scale of this impact remains poorly understood.
To fill this gap in our knowledge, global and local properties of atmospheric aerosols
have been extensively observed and measured using both satellite and ground-based
instruments, especially during the last decade. Aerosol properties retrieved by the
different instruments contribute to an unprecedented availability of the most complete set
of complementary aerosol measurements ever acquired. However, some of these
measurements remain underutilized, largely due to the complexities involved in
analyzing them synergistically.

To characterize the inconsistencies and bridge the gap that exists between the sensors, we 
have established a Multi-sensor Aerosol Products Sampling System (MAPSS), which 
consistently and uniformly samples aerosol products from multiple spaceborne sensors, 
including MODIS (on Terra and Aqua), MISR, OMI, POLDER, CALIOP, and SeaWiFS. 
Samples of satellite aerosol products are extracted over Aerosol Robotic Network 
(AERONET) locations as well as over other locations of interest such as those with 
available ground-based aerosol observations. 

In this way, MAPSS enables aerosol scientists across the world to compare and integrate 
data between aerosol observations from multiple sensors, enhancing our understanding of 
aerosols, and improving the quality of the aerosol measurements. In our paper, we 
explain the sampling methodology and concepts used in MAPSS, and demonstrate 
specific examples of using MAPSS for an integrated analysis of multiple aerosol 
products. 
Abstract

Global and local properties of atmospheric aerosols have been extensively observed and
measured using both spaceborne and ground-based instruments, especially during the last
decade. Unique properties retrieved by the different instruments contribute to an
unprecedented availability of the most complete set of complementary aerosol measurements
ever acquired. However, some of these measurements remain underutilized, largely due to the
complexities involved in analyzing them synergistically. To characterize the inconsistencies
and bridge the gap that exists between the sensors, we have established a Multi-sensor
Aerosol Products Sampling System (MAPSS), which consistently samples and generates the
spatial statistics (mean, standard deviation, direction and rate of spatial variation, and spatial
correlation coefficient) of aerosol products from multiple spaceborne sensors, including
MODIS (on Terra and Aqua), MISR, OMI, POLDER, CALIOP, and SeaWiFS. Samples of
satellite aerosol products are extracted over Aerosol Robotic Network (AERONET) locations
as well as over other locations of interest such as those with available ground-based aerosol
observations. In this way, MAPSS enables a direct cross-characterization and data integration
between Level-2 aerosol observations from multiple sensors. In addition, the available
well-characterized co-located ground-based data provides the basis for the integrated
validation of these products. This paper explains the sampling methodology and concepts used
in MAPSS, and demonstrates specific examples of using MAPSS for an integrated analysis of
multiple aerosol products.


1 Introduction 


Atmospheric aerosol parameters are routinely retrieved and archived for public access by an
array of ground-based and spaceborne sensors, especially since the 1990s (Holben et al. 1992,
1998; Herman et al., 1997; Husar et al., 1997; Deuze et al., 1999; Mishchenko et al., 1999;
Ignatov and Stowe 2002; Chu et al., 2002; Remer et al., 2002; Hsu et al., 2004; Ichoku et al.,
2005; Torres et al., 2007; Winker et al., 2007; Levy et al., 2009). However, the integrated use
of these observations is greatly complicated by the numerous discrepancies and differences
that exist between the sensors and their aerosol products, including dissimilar spatial and
temporal resolutions, archival strategies, approaches to quality control, and so forth (Kahn et
al., 2007; Liu and Mishchenko 2008; Li et al., 2009). The problem is further complicated by
the differences in the algorithms, underlying assumptions, and uncertainties involved in
creating the products.

The purpose of this paper is to introduce the Multi-sensor Aerosol Products Sampling System
(MAPSS), which is a framework designed to provide a uniform and consistent sampling of
aerosol products from multiple sources. MAPSS was originally designed to support the
validation effort for the aerosol retrieval algorithm from the Moderate-resolution Imaging
Spectro-radiometer (MODIS) sensor aboard the Terra and Aqua satellites (Ichoku et al.,
2002), but has been redesigned with the aim of facilitating an integrated use and detailed
comparative analysis of aerosol measurements from multiple satellite sensors. Currently, the
supported sensors include MODIS on Terra and Aqua, the Multi-angle Imaging
SpectroRadiometer (MISR) on Terra, the Ozone Monitoring Instrument (OMI) on Aura, the
POLarization and Directionality of the Earth's Reflectances (POLDER) on Parasol and its
heritage ADEOS and ADEOS-2 satellites, the Cloud-Aerosol Lidar with Orthogonal
Polarization (CALIOP) on Calipso, as well as aerosol retrievals using the Deep Blue
algorithm from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) aboard the SeaStar
spacecraft. Like the original MAPSS, this multi-sensor version is also based on the
collocation of the satellite data products over the global AERosol Robotic NETwork
(AERONET) of ground-based sun-photometer stations and over other important sites.

The relevant characteristics of the aerosol data products from the different sensors are
described in Sect. 2, while the details of the MAPSS sampling concepts are explained in Sect.
3. The implementation of MAPSS on the Web, online data access and analysis approaches,
and user tips are described in Sect. 4. Section 5 discusses possible applications of the
proposed system, followed by conclusions in Sect. 6.

2 Supported aerosol products

This work focuses on aerosol observations from multiple ground-based and spaceborne
instruments; the supported instruments, products, and some of their key characteristics are
outlined in Table 1. It is pertinent to note that all satellite data products supported in MAPSS
are derived directly from the retrieval level aerosol products (Level 2); the Level 2 data
represents the highest available spatial resolution for each product/sensor combination and is
free of aggregation artifacts that can be present in data at Level 3 (Levy et al., 2009; Zhang
and Reid, 2010; Hyer et al., 2011). The remainder of this section provides a brief description
of each of the products supported in MAPSS, while highlighting unique aerosol properties
reported in these products.

AERONET (http://aeronet.gsfc.nasa.gov) sun-photometers measure aerosol properties using
ground-based observations of solar direct and diffuse radiances in three observation
configurations (direct solar, principal plane, and almucantar), based upon which they provide
three distinct aerosol product categories. The first is the aerosol optical depth or thickness
(AOD or AOT) product, which is obtained from the AERONET direct measurements of solar
irradiance. The second is the spectral deconvolution aerosol product (SDA), which is a more
advanced product that uses a spectral deconvolution algorithm to derive additional aerosol
properties not available by means of the direct retrievals. Finally, the third is the inversion
aerosol product (INV), which is an aerosol product that uses an inversion algorithm that,
based on a limited set of measured aerosol properties, estimates possible values for other
properties. Additionally, for the convenience of aerosol data inter-comparison and validation,
MAPSS provides an auxiliary dataset with AERONET AOD interpolated to the common
wavelengths used in the spaceborne retrievals based on the established wavelength
dependence of AOD (Eck et al., 1999). However, it is important to note that the interpolated
dataset is not quality-assured by the AERONET team and may contain inaccuracies that are
inherent in the process of interpolation.
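
To illustrate the idea of wavelength interpolation, the following is a minimal sketch that assumes a simple two-wavelength power law (AOD proportional to wavelength raised to the negative Angstrom exponent). This is a simplification chosen for illustration; the interpolation actually applied in MAPSS may use a different fit, and the function names and the sample AOD values are hypothetical.

```python
import math


def angstrom_exponent(aod1, wl1, aod2, wl2):
    """Angstrom exponent from AOD at two wavelengths (same units for wl1 and wl2)."""
    return -math.log(aod1 / aod2) / math.log(wl1 / wl2)


def interpolate_aod(aod1, wl1, aod2, wl2, target_wl):
    """Estimate AOD at target_wl, assuming AOD ~ wavelength ** (-alpha)."""
    alpha = angstrom_exponent(aod1, wl1, aod2, wl2)
    return aod1 * (target_wl / wl1) ** (-alpha)


# Illustrative (made-up) AOD values at 440 nm and 675 nm, interpolated to 550 nm.
aod_550 = interpolate_aod(0.30, 440.0, 0.18, 675.0, 550.0)
```

Because the power law is only an approximation of the true spectral dependence, the interpolated value carries the interpolation uncertainty noted above.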

The MODIS (http://modis.gsfc.nasa.gov) aerosol product (MOD04 and MYD04) comprises
the ambient aerosol optical thickness and other physical properties of aerosols retrieved
globally over land and ocean. It should be noted that the dataset name prefix ‘MOD’ signifies
the MODIS instrument onboard the Terra satellite, while ‘MYD’ indicates the MODIS
instrument onboard the Aqua satellite. All the MODIS aerosol products currently included in
MAPSS are those retrieved at 10-km nominal resolution at nadir. In addition to the aerosol
product, MAPSS also includes the MODIS precipitable water vapor product designated by
MOD05 and MYD05 for Terra and Aqua, respectively. The MODIS precipitable water vapor
product is of two kinds, one based on infrared (IR) retrieval at 1-km nominal spatial
resolution, and the other from near-infrared retrieval at 5-km nominal spatial resolution.

The MISR (http://www-misr.jpl.nasa.gov) aerosol product (MIL2ASAE) features aerosol
retrievals based on observations from 9 independent camera angles. Multiple viewing angles
allow MISR to measure certain aerosol properties that are not available from the other
instruments (e.g., aerosol particle size). Furthermore, MISR's multiple cameras enable
retrievals under conditions that are unfavorable to single-view (e.g., nadir) instruments, such
as over bright surfaces or sun glint, where the other instruments are unable to make reliable
retrievals in the visible wavelengths.

The OMI (http://www.knmi.nl/omi/research/instrument/index.php) aerosol product
(OMAERUV) measures the near-UV (near ultraviolet) aerosol absorption and extinction
optical depth, as well as single scattering albedo, among other aerosol properties. Moreover,
OMI is capable of retrieving absorption optical depth in partially cloudy conditions that
usually pose a challenge to other aerosol instruments.

The POLDER (http://www.icare.univ-lille1.fr/parasol) aerosol land product (PxL2TLGC) and
aerosol ocean product (PxL2TOGC) are derived from measuring spectral, directional, and
polarized properties of reflected solar radiation. One of the main features of the POLDER
instrument is its utilization of polarization properties of the measured radiation for retrieving
anthropogenic aerosol optical depth.

The CALIOP (http://www-calipso.larc.nasa.gov) aerosol product (05kmALay) represents
atmospheric curtain slices portraying the vertical distribution of aerosols and clouds in the
atmosphere, including the density and certain properties of individual aerosol layers.

The SeaWiFS (http://disc.sci.gsfc.nasa.gov/dust/) aerosol product (SWDB) uses the Deep
Blue (Hsu et al., 2004) algorithm to derive aerosol optical thickness and Ångström exponent.
The key features of this product are the retrievals of aerosol properties over both bright desert
and vegetated surfaces, and a highly precise calibration of the SeaWiFS sensor.
Since each of the foregoing data sets has a few versions because of the periodic revisions and
updates of their retrieval algorithms over time, the data versions that were current at the time
of MAPSS development were processed. Collection 051 was processed for both Terra and
Aqua MODIS, Version 0022 for MISR, Version 003 for OMI, Version 3-01 for CALIOP,
Version K for POLDER, Version 002 for SeaWiFS (preliminary version), Version 2 for the
AERONET AOD and INV products, and Version 4.1 for the AERONET SDA product.
Therefore, unless otherwise specified, all of the illustrations and analyses shown in this paper
are based on these data versions for the respective satellite sensors, and Version 2 for
AERONET.

3 Data Sampling and Analysis

The proposed framework expands on the concepts of the sampling approach that was
developed by Ichoku et al. (2002), and used for validation and analysis of MODIS aerosol
products using AERONET measurements (Remer et al., 2002; Chu et al., 2002; Ichoku et al.,
2003, 2005). In the original approach, the spatial aerosol measurements acquired aboard the
Terra and Aqua spacecraft were sampled within 50x50 km areas, centered over AERONET
sun photometer measurement sites, as well as over certain other point locations where the
satellite aerosol data samples are required. The pixel containing the ground station was
determined by finding a pixel with the minimal Euclidean distance between the
longitude/latitude coordinates of the center of this pixel and those of the ground station. Next,
the extent of the sampling area was determined by finding surrounding pixels located no more
than 25 km from the central pixel, based on the Euclidean distance between their longitude
and latitude coordinates. In turn, temporal measurements from each ground-based location
were sampled at each satellite overpass time in 1-hour sampling segments (i.e., 30 minutes
before and 30 minutes after the overpass).
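
The 1-hour temporal sampling segment can be sketched as follows. This is an illustrative example rather than the MAPSS code: the function names and the (time, value) layout of the ground-based measurements are assumptions, and the sample values are made up.

```python
from datetime import datetime, timedelta


def sample_temporal_window(measurements, overpass, half_width=timedelta(minutes=30)):
    """Keep the (time, value) pairs that fall within +/- half_width of the overpass."""
    return [(t, v) for t, v in measurements if abs(t - overpass) <= half_width]


def closest_in_time(measurements, overpass):
    """Return the (time, value) pair closest in time to the overpass."""
    return min(measurements, key=lambda tv: abs(tv[0] - overpass))


overpass = datetime(2010, 7, 6, 18, 8)
ground = [
    (datetime(2010, 7, 6, 17, 30), 0.21),  # 38 min before: outside the window
    (datetime(2010, 7, 6, 17, 45), 0.22),
    (datetime(2010, 7, 6, 18, 10), 0.24),
    (datetime(2010, 7, 6, 18, 50), 0.27),  # 42 min after: outside the window
]
window = sample_temporal_window(ground, overpass)
central = closest_in_time(window, overpass)
```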

The Multi-sensor Aerosol Products Sampling System (MAPSS) described in this paper uses a
similar sampling approach and includes data sampling from a wider variety of spaceborne
aerosol sensors, as outlined in Table 1. To accommodate the various resolutions of the
extended set of the supported aerosol products and to make their ground sampled areas as
equivalent as possible, instead of sampling within the nominal 50x50-km square used for
MODIS in the original MAPSS, the multi-sensor data sampling space is now defined by a
circle of approximately 50-km diameter that is centered on the ground (e.g., AERONET)
measurement station (see Figure 1). However, while testing the 50-km diameter sampling, it
was found that, because of the wide variety of pixel shapes and sizes among the satellite
products, some of the coarser-resolution products were not sampled in a balanced way around
the ground site, even at nadir. For instance, if POLDER data (nominal resolution: 19x19 km
at nadir) were sampled within a 50-km diameter circle, with the ground point located slightly
off the center of a pixel, only the neighbors of the central pixel on the same side as the ground
point would be sampled, while those on the opposite side would be omitted. Therefore, based
on empirical analysis of the different data products, it was found that a diameter of 55 km
would enable overall balanced sampling within the circular sample space for the different data
products, at least near nadir. In effect, a pixel is sampled only if the distance between its
center and the ground station does not exceed 27.5 km, calculated from the longitude/latitude
coordinates of the ground station and the pixel’s center using the Haversine distance formula
(Sinnott 1984). The actual number of possible pixels within this 55-km diameter sample space
depends on the pixel shape and size for each sensor. Also, for a given sensor, since the pixel
size typically increases away from nadir, the maximum number of pixels within the
sample space decreases with the distance of the ground station away from the nadir of the
satellite scene. Given their respective nominal spatial resolutions (see Table 1), the maximum
number of pixels within the 55-km diameter sample space at nadir for the different sensors is
as follows: MODIS - 25, MISR - 9, OMI - 8, POLDER - 9, CALIOP - 11, and SeaWiFS - 16.
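
The distance test described above can be sketched as follows. This is a minimal illustration, not the MAPSS implementation: the mean Earth radius of 6371 km, the function names, and the (lat, lon, value) pixel layout are assumptions, and the coordinates in the example are arbitrary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not specified in this paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def sample_pixels(station, pixels, radius_km=27.5):
    """Keep the (lat, lon, value) pixels whose centers lie within radius_km of the station."""
    lat0, lon0 = station
    return [p for p in pixels if haversine_km(lat0, lon0, p[0], p[1]) <= radius_km]


station = (38.99, -76.84)  # arbitrary example coordinates
pixels = [
    (38.99, -76.84, 0.25),  # at the station
    (39.20, -76.84, 0.27),  # about 23 km north: inside the sample space
    (39.30, -76.84, 0.31),  # about 34 km north: outside the sample space
]
sampled = sample_pixels(station, pixels)
```

Only the first two pixels fall within the 27.5-km radius in this example.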

To determine the effects (if any) of using the circular (instead of square) sample space on the
sample statistics, means of Terra MODIS Collection 5 AOD derived from both sample spaces
(i.e., 50 x 50-km square and 55-km diameter circle) were calculated and plotted against
corresponding statistics from AERONET for comparison (Figure 2). The results indicate that
the difference in the shape of the sampling space has only a small effect on the derived
sample statistics of the data. However, it was found that the circle-based sampling produced
approximately 22% fewer data points: 2,394,624 data points were generated for the studied
10-year period using the square-based sampling, while only 1,881,858 data points were
produced using the circle-based sampling. This difference can be explained by recalling that,
when the ground station is located off-nadir of the sensor or off-center of the central pixel in
the sample space, in order to maintain a uniform sampling area, the number of pixels in a
circle-based sample can be much reduced even down to 1, depending on the sensor
observation geometry and how far the ground station is located off-nadir. This can result in a
null subset, if none of the reduced number of sample points contains aerosol retrieval. In
contrast, the number of pixels in the square-based samples remains roughly constant
regardless of the retrieval geometry conditions, resulting in the larger number of data points.
Additionally, further inspection of the data revealed that, while the number of data
points decreased uniformly over most of the sampling locations, it increased in certain
locations, including islands, coastal areas, and locations in high latitudes. This difference can
be attributed to the greater accuracy of the Haversine formula used in the new circle-based
sampling compared to the Euclidean distance approach that was used previously. In the
MAPSS application, the Haversine formula is accurate within approximately 200 meters,
which allows a high precision for improving sampling in the above-mentioned areas, where
aerosol retrievals are scarcer.

3.1 Statistics of Sampled Data

Data products with very different spatial resolutions have been sampled within the uniform
55-km diameter circular sample space in order to render them as comparable as possible using
appropriate analysis tools. As such, the data subsets sampled over each ground station during
each satellite overpass time are used to calculate a number of appropriate statistics, including
their mean, median, mode, and standard deviation. Corresponding statistics are also calculated
from the temporal subsets of the ground-based measurements. In either case, only data points
with valid values of each measured parameter are used to calculate the statistics. The ranges
of valid data values are established a priori based on the “valid_range” pair of values
specified in the metadata for some of the data products. Where such valid range is not
specified in the data, a quantitatively reasonable valid range is assumed. For instance, 0° -
180° is set as the valid range for ‘solar zenith angle’ because this is the known normal range
for this type of zenith-to-nadir angular parameter. Specifying the valid ranges for the different
parameters ensures that fill values (usually large negative numbers such as -9999, which are
used in the different data sets to indicate non-retrieval or other data gaps), as well as other
spurious values that may occur in any of the data products, are automatically excluded from
the statistics calculations. No other filtering is applied to the data before the statistics are
calculated. The statistics include:

• ndat - total number of data points sampled (e.g., number of pixels contained within
the 55-km sample space in a satellite dataset);

• nval - number of sampled data points with valid data for use in the statistics
calculation;

• cval - value of a central sampling data point in the subset. For the spaceborne data,
this is a point in the sample that has the smallest distance to the ground station. For the
ground-based data, this is a point in the sample that is the closest in time to the
overpass of the satellite;

• mean - mean of the subset;

• medn - median of the subset;

• mode - mode of the subset;

• sdev - sample standard deviation of the subset. (Note: when nval<1, this value is
undefined as opposed to zero).

It is important to note that mean, median, and sdev are computed only for continuous 
datasets (i.e., datasets comprising real numbers). On the other hand, mode is computed only 
for discrete datasets (i.e., datasets comprising only integer numbers, for example, data quality 
flags). 
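
As a rough sketch of how these statistics could be derived for one subset, consider the example below. It is not the MAPSS code: the function name, the argument layout, and the choice to take cval from the nearest sample regardless of its validity are assumptions made for illustration, and the values are made up. The sample standard deviation here uses the n-1 denominator, so the sketch also leaves it undefined for a single valid point.

```python
import statistics


def subset_statistics(values, offsets, valid_range, discrete=False):
    """Summarize one sampled subset.

    values      -- sampled data values (may contain fill values such as -9999)
    offsets     -- distance to the station (or time offset to the overpass), one per value
    valid_range -- (low, high) pair used to reject fill and spurious values
    discrete    -- True for integer data such as QA flags, False for real-valued data
    """
    low, high = valid_range
    valid = [v for v in values if low <= v <= high]
    ndat, nval = len(values), len(valid)
    # Assumption: cval is the value of the nearest sample, whether or not it is valid.
    cval = values[min(range(ndat), key=lambda i: offsets[i])] if ndat else None
    stats = {"ndat": ndat, "nval": nval, "cval": cval}
    if discrete:
        stats["mode"] = statistics.mode(valid) if nval else None
    else:
        stats["mean"] = statistics.fmean(valid) if nval else None
        stats["medn"] = statistics.median(valid) if nval else None
        stats["sdev"] = statistics.stdev(valid) if nval >= 2 else None
    return stats


aod_stats = subset_statistics([0.2, -9999.0, 0.4, 0.3], [10.0, 25.0, 20.0, 5.0], (0.0, 5.0))
qa_stats = subset_statistics([3, 3, 2, -9999, 1], [1, 2, 3, 4, 5], (0, 3), discrete=True)
```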

It is pertinent to note that, since the ground-based AERONET measurements are sampled and 
recorded in the MAPSS archive for every satellite overpass, in cases where multiple satellites 
pass over a particular location within a 1-hour timeframe (e.g., satellites in the A-Train 
formation), a single AERONET measurement can be sampled and recorded in the archive 
multiple times. It is recommended to account for this duplication when further aggregation of 
the AERONET data in the MAPSS archive is performed, in order to avoid possible 
oversampling issues. 


3.2 Characterizing the spatio-temporal variability of the data

The sampling of the satellite aerosol data beyond the pixels lying directly over the ground
stations is intended to provide not only the average values of the measured parameters, but
also their local variability and other characteristics (over the 55-km sample space around the
station) that can enhance detailed scientific research and validation. Therefore, in addition to
computing the basic statistical parameters, a linear multiple regression plane is fitted to each
continuous spatial subset, and a linear regression line to each continuous temporal subset (Fox
1997; Ichoku et al., 2002). Based on this fit, the following additional statistics are computed:

• slop - slope of the fitted plane or line;

• slaz - azimuth (direction) of the slope of the plane;

• mcoc / lcoc - multiple correlation coefficient (for a spatial subset regressed on the
lat/lon coordinates of the sample points) or linear correlation coefficient (for a
temporal subset regressed on the measurement times of the ground-based data samples
such as those of AERONET).

These (slop, slaz, and mcoc/lcoc) statistics are not computed unless the number of sampled
valid data points is sufficient to obtain a statistically robust fit for the plane or line. To ensure
that this condition is met, the shape, size and maximum number of pixels that fall within the
nadir sampling area of each sensor were carefully examined. Based on this empirical analysis,
the minimal required number of valid data points to fit a plane was determined for each sensor
so that these points do not form a degenerate plane; in other words, the data points in a subset
should not all fall on the same line. In this way, the minimal numbers of valid data points
required to fit a plane were set for the different aerosol products as: MODIS - 10, MISR - 5,
OMI - 4, POLDER - 5, SeaWiFS - 7. Likewise, the minimal number of valid data points
required to fit a line for AERONET and CALIOP aerosol products was set to 2, as this is the
minimal number of points that can be used to uniquely define a line.

As shown in Figure 3 and Figure 4, these statistics can be used to assess the local
spatio-temporal distribution and variation of the samples. In particular, the azimuth (i.e., slaz)
parameter indicates the direction of the gradient of an aerosol parameter under consideration,
pointing toward the lower values of the parameter. For instance, if AOD is the parameter of
interest, slaz would typically indicate the direction of decreasing aerosol density, which
would generally correspond with the direction of wind flow and plume dispersion from the
aerosol source. Even when a sampling area contains multiple aerosol plumes, slaz for AOD
can still point in the direction from the optically thickest to the optically thinnest plume, as
demonstrated in Figure 4. In such cases, in addition to slaz, it is helpful to consider the spatial
slope (slop) and multiple correlation coefficient (mcoc) also, as lower values of these spatial
statistics would indicate a single homogeneous plume, while higher values might indicate
multiple plumes, or a strong aerosol source present in the 55-km-diameter sample space.
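
A least-squares plane fit of the kind described in this section can be sketched as follows. This is an illustration under stated assumptions, not the MAPSS implementation: pixel positions are taken as east/north offsets in km from the station (the text above regresses on lat/lon coordinates), slaz is assumed to be measured clockwise from north, and the function name and sample values are hypothetical.

```python
import math


def fit_plane(points, min_points):
    """Fit value = a + b*east + c*north to (east_km, north_km, value) points.

    Returns a dict with slop, slaz and mcoc, or None when there are too few
    points or the pixel centers are collinear (degenerate plane).
    """
    n = len(points)
    if n < max(min_points, 3):
        return None
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxz = sum((p[0] - mx) * (p[2] - mz) for p in points)
    syz = sum((p[1] - my) * (p[2] - mz) for p in points)
    szz = sum((p[2] - mz) ** 2 for p in points)
    det = sxx * syy - sxy ** 2
    if det <= 1e-12 * sxx * syy:
        return None  # all pixel centers lie (nearly) on one line
    b = (sxz * syy - syz * sxy) / det  # change in value per km eastward
    c = (syz * sxx - sxz * sxy) / det  # change in value per km northward
    slop = math.hypot(b, c)
    # Azimuth of the downhill direction (toward lower values), clockwise from north.
    slaz = math.degrees(math.atan2(-b, -c)) % 360.0 if slop > 0 else None
    mcoc = math.sqrt(min(1.0, max(0.0, (b * sxz + c * syz) / szz))) if szz > 0 else None
    return {"slop": slop, "slaz": slaz, "mcoc": mcoc}


# Made-up AOD field that decreases by 0.01 per km toward the east.
pts = [(0, 0, 0.5), (10, 0, 0.4), (0, 10, 0.5), (10, 10, 0.4), (-10, -10, 0.6)]
fit = fit_plane(pts, min_points=5)
```

In this example the fitted azimuth is close to 90 degrees, i.e., the AOD decreases toward the east, and the multiple correlation coefficient is close to 1 because the points lie exactly on a plane.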

3.3 Quality assurance (QA) data

The Level-2 aerosol products from all of the satellite sensors supported in MAPSS include
quality assurance / quality control (QA) flags that indicate the “trustworthiness” of individual
pixels in these datasets. For MODIS and SeaWiFS, aerosol QA flags are integer numbers
ranging from 0 to 3, with 3 representing the highest quality. For the MISR and OMI data, the
reverse is the case (i.e., 0 is the highest quality). Finally, for POLDER and CALIOP, QA data
are a combination of one or more flags, most of which are real numbers ranging between 0
and 1, where 1 indicates the highest quality. Table 2 provides a short summary of the
discussed QA flags, while for detailed guidelines on the usage of these flags, depending on a
particular science application, it is advised to consult the up-to-date science team
recommendations for the analyzed products.

The QA values are set by the product retrieval algorithms and can be used to screen out
pixels with potentially uncertain or erroneous retrievals or even invalid values. To facilitate
such screening, MAPSS extracts the QA flags over the sampling area and computes the
statistical mode for integer QA flags and mean for real QA flags. These statistical modes of
the integer QA flags and means of the real QA flags provide an equivalent of a single number
quality assessment for each sample, and can be used to screen the corresponding subset
statistics, as demonstrated in Figure 5.

To test whether the approach of screening the already computed statistics of the aerosol
parameters based on the statistically aggregated values of the QA flags (Ichoku et al., 2002,
2003) has the potential of being less effective than screening individual pixels using their
respective QA flags before computing the sample statistics discussed above, the two
approaches were compared as outlined in Table 3 and Table 4. It was found that the screening
of the aerosol parameter subset statistics by their QA mode produces results that are similar to
the screening of individual pixels by their QA before computing the statistics, although the
former method results in slightly fewer data points, since entire data points are rejected based
on the aggregated QA value even if some of their component individual pixels have good QA
flags. However, this is a small trade-off compared to the increased amount of effort involved
in pre-screening before statistics.
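
The two screening approaches compared above can be sketched as follows. The example assumes MODIS-style integer QA flags (0 to 3, with 3 the highest quality); the function names, the data layout, and the sample values are hypothetical and serve only to illustrate why screening by the QA mode can yield fewer data points.

```python
import statistics


def screen_statistics_by_qa_mode(samples, min_qa):
    """Compute each sample mean from all pixels; keep it only if the QA mode >= min_qa."""
    return [
        statistics.fmean(values)
        for values, qa in samples
        if statistics.mode(qa) >= min_qa
    ]


def screen_pixels_before_statistics(samples, min_qa):
    """Drop pixels with QA < min_qa first, then compute the mean of the remaining pixels."""
    means = []
    for values, qa in samples:
        good = [v for v, q in zip(values, qa) if q >= min_qa]
        if good:
            means.append(statistics.fmean(good))
    return means


# Each sample is (pixel AOD values, pixel QA flags); the numbers are made up.
samples = [
    ([0.2, 0.3, 0.4], [3, 3, 1]),
    ([0.5, 0.6, 0.7], [1, 1, 3]),
]
by_mode = screen_statistics_by_qa_mode(samples, min_qa=3)
by_pixel = screen_pixels_before_statistics(samples, min_qa=3)
```

Here the second sample is rejected outright by the QA-mode screening, whereas pixel-level screening retains it on the strength of its single high-quality pixel.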

4 Data Management and Accessibility

The subset statistics of hundreds of aerosol parameters and ancillary data sampled from daily
measurements of six satellite sensors, over hundreds of ground-based stations, constitute an
enormous amount of data, whose data management and accessibility requirements are
non-trivial. Therefore, special tools and resources were developed to handle data batch
processing, storage, updates, and access, as seamlessly as possible. Data management is
handled through a custom-designed database, while data access is through a Web interface.

4.1 Batch Processing

An automated software system has been developed to perform the multi-sensor aerosol
sampling and statistical analysis in batch mode at pre-specified times. The MAPSS system
interrogates the list of ground sites to determine the current list of sites over which data
sampling is to be performed. It then fetches the aerosol data products from the online data
sources of the supported satellite sensors to extract data subsets and derive their relevant
statistics, as described in the previous section. Similarly, available AERONET data are
obtained and sampled to derive corresponding temporal statistics. These analyses are
performed on a daily basis, and the derived subset statistics are archived in simple
comma-separated text (CSV) files that are easily accessible online
(http://modis-atmos.gsfc.nasa.gov/MAPSS/; as of August 2011, this archive contained over
1,420,000 CSV files). However, in some cases, there may be a delay of several days between
the time the data becomes available in a particular sensor’s data repository and the time the
extracted subset statistics become accessible in MAPSS. This delay is associated with the time
required for data retrieval and processing. Also, there are likely to be some data gaps resulting
from the inability of the different sensor algorithms to retrieve certain aerosol parameters due
to unfavorable conditions, such as certain types of surface characteristics, clouds in reflective
bands, sun glint over ocean, or even data downtime for sensor calibrations.

4.2 The MAPSS Database and Data Structure

The multi-sensor aerosol data analysis requires exploration of how to organize such a diverse
set of data in a coherent, user-friendly manner to facilitate access and analysis. While a basic
examination of the data can be effectively performed using the described comma-separated
text (CSV) file archive, the sheer amount of the sampled data and the limitations of the text
file access make a more sophisticated analysis less practical. Therefore, it was necessary to
establish a relational database that would streamline the access and the querying of the data.
For this purpose, a dedicated PostgreSQL (http://www.postgresql.org) database was created,
with the data structure designed to precisely reflect the logical organization of the sampled
data.
The database is organized as a collection of individual data records, where each data record
stores a sampled set of measurements that are acquired by a specific sensor at a specific
location and at a specific time, and a corresponding list of ancillary data, including the
measurement geometry (e.g., solar zenith at the time of the measurement, sensor azimuth,
etc.) and QA information. Additionally, each data record is associated with data provenance
information such as the name of the aerosol data file sampled, and the location of the central
sampling point (cval) in this file. This information provides for quick access to the original
data and allows for more detailed exploration of data points of interest.

Thus, each data record consists of the statistics that are computed for one or more aerosol
parameters retrieved by a specific aerosol sensor, as described in Sections 3.1 and 3.2. For
example, AOD, fine mode fraction, reflectance, and other aerosol parameters in the Aqua
MODIS aerosol product (MYD04), sampled at 18:08 UTC over the GSFC site on 2010-07-06,
comprise a single data record, whereas mean, sdev, mcoc, and other statistics computed
from the data sample for AOD at 550nm constitute a single statistics record.
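
The distinction between a data record and a statistics record can be pictured with a minimal Python sketch. The class and field names below (DataRecord, StatisticsRecord, source_file, and so on) are illustrative assumptions and not the actual MAPSS PostgreSQL schema; only the statistic names mean, sdev, and cval and the MYD04/GSFC example come from the text. The numeric values and the file name are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StatisticsRecord:
    """Statistics computed from one sampled parameter (hypothetical layout)."""
    parameter: str  # e.g. "AOD at 550nm"
    mean: float
    sdev: float
    cval: float     # value at the central sampling point


@dataclass
class DataRecord:
    """One sensor sample at one location and time (hypothetical layout)."""
    product: str      # e.g. "MYD04"
    site: str         # e.g. "GSFC"
    time_utc: datetime
    source_file: str  # provenance: name of the sampled aerosol data file
    geometry: dict = field(default_factory=dict)    # e.g. solar zenith
    statistics: list = field(default_factory=list)  # StatisticsRecord items


record = DataRecord(
    product="MYD04",
    site="GSFC",
    time_utc=datetime(2010, 7, 6, 18, 8, tzinfo=timezone.utc),
    source_file="example_granule.hdf",  # placeholder name, not a real file
)
# Placeholder numbers for illustration only, not real retrievals.
record.statistics.append(
    StatisticsRecord(parameter="AOD at 550nm", mean=0.21, sdev=0.03, cval=0.20)
)
```

In a relational layout, the same relationship would typically be a one-to-many link from a data-record table to a statistics-record table, which is consistent with the much larger count of statistics records reported for the database.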

The database is routinely populated with the most current information from the MAPSS CSV
files. In August 2011, the database contained 28,818,432 data records and 1,613,178,051
statistics records. This size, however, poses a set of database maintenance challenges: the
operations necessary to keep the database consistent and up to date require substantial
computation time. Therefore, the information in the database can be less current than the
information in the CSV archive.

It is expected that as new product versions become available, they will also be extracted and
organized in the MAPSS database, such that it would also be possible, if desired, to access
and compare different versions of aerosol retrievals from the same sensor.

4.3 Web MAPSS

A special point-based data analysis system was created within the framework of the Giovanni
system (Acker and Leptoukh, 2007; Berrick et al., 2009) to provide simple and customized
Web-based access to the data archived in the MAPSS database
(http://giovanni.gsfc.nasa.gov/mapss/). A screenshot of this so-called Web MAPSS data-access
interface is shown in Figure 6. Through this interface, it is possible to select desired
sampling locations and time period, aerosol products and associated data sets, as well as
statistical variables of interest and their desired range of data quality (QA). Based on these
criteria, Web MAPSS selects appropriate MAPSS data samples, filters them according to the
specified QA values, aligns the data in time and space, and produces a graphic plot of the
data. The selected data sets are also formatted and staged for download as a comma-separated
values (CSV) text file. In this way, Web MAPSS allows the user to quickly and intuitively
assess a combination of aerosol properties from multiple satellite sensors collocated with
AERONET ground-based measurements over one or more ground locations, without having to
download vast amounts of data from disparate product archives. In addition, MAPSS provides
documentation explaining the basic attributes of each sampled product, complete with links to
the relevant algorithm theoretical basis documents (ATBD) of the original products, thereby
saving the time needed for locating these documents and facilitating an exploration of
unfamiliar aerosol products.
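
The QA filtering and CSV staging steps described for Web MAPSS can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (site, time, mean, qa) and the helper names filter_by_qa and to_csv are hypothetical and do not reflect the actual Web MAPSS implementation or its output columns, and the sample values are placeholders.

```python
import csv
import io


def filter_by_qa(samples, allowed_qa):
    """Keep only the samples whose QA value is in the allowed set."""
    return [s for s in samples if s["qa"] in allowed_qa]


def to_csv(samples, columns):
    """Serialize the selected samples as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        writer.writerow({c: sample[c] for c in columns})
    return buffer.getvalue()


# Placeholder samples for illustration only, not real retrievals.
samples = [
    {"site": "GSFC", "time": "2010-07-06T18:08Z", "mean": 0.21, "qa": 3},
    {"site": "GSFC", "time": "2010-07-07T17:15Z", "mean": 0.35, "qa": 1},
    {"site": "GSFC", "time": "2010-07-08T18:00Z", "mean": 0.18, "qa": 3},
]

# Keep only the highest-confidence MODIS-style QA value (3 = very good).
best = filter_by_qa(samples, allowed_qa={3})
csv_text = to_csv(best, columns=["site", "time", "mean", "qa"])
```

Passing the allowed QA values as a set lets the same helper express either a single QA level or a range of acceptable levels.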


5 Applications

Several possible applications of the data sampled by the MAPSS system have been
envisioned. For example, the sampled data can be used for comparing spaceborne
observations with corresponding ground-based measurements. Based on such comparisons, it
would be possible to assess the accuracy of aerosol retrievals from multiple spaceborne
instruments in a manner similar to the validation studies of the MODIS aerosol products
(Remer et al., 2002, 2005, 2008; Chu et al., 2002; Ichoku et al., 2003, 2005; Levy et al.,
2010).

Simultaneous comparison of aerosol retrieval accuracy from multiple satellite sensors can
help investigate the intrinsic strengths and weaknesses of the different instruments for
aerosol remote sensing over different regions of the globe. For instance, appropriate
comparative analysis results could indicate which sensors are particularly suitable for
analysis of aerosols in a given region of the globe, and could help explore the peculiarities
of aerosol retrievals from a particular instrument over this region. Figure 8 shows an
example, where measurements of aerosol optical depth (AOD) from multiple spaceborne sensors
over two locations, (a) Evora (Portugal) and (b) Bondville (USA), are compared to the
corresponding interpolated measurements from AERONET. The spaceborne sensors and
retrieval algorithms used in the comparison include Terra MODIS Dark Target - land (TMOD
DT), Terra MODIS Deep Blue - land (TMOD DB), Aqua MODIS Dark Target - land
(AMOD DT), Aqua MODIS Deep Blue - land (AMOD DB), MISR, OMI, and CALIOP. A
comparison of the regression equations and squared correlation coefficients from the different
sensors over the two locations highlights differences that exist in aerosol retrievals from
multiple sensors.
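
A comparison of this kind reduces to fitting a line of satellite AOD against AERONET AOD for each sensor and reporting the regression equation and the squared correlation coefficient. The following Python sketch assumes an ordinary least-squares fit, which the text does not specify; the sensor labels and AOD values are placeholders, not data from Figure 8.

```python
def linear_fit(x, y):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder values for illustration only, not real retrievals.
aeronet_aod = [0.10, 0.20, 0.30, 0.40]
satellite_aod = {
    "sensor_a": [0.15, 0.25, 0.35, 0.45],  # constant offset from AERONET
    "sensor_b": [0.12, 0.18, 0.33, 0.37],  # scattered around AERONET
}

# One (slope, intercept, r_squared) tuple per sensor, all against AERONET.
fits = {name: linear_fit(aeronet_aod, values)
        for name, values in satellite_aod.items()}
```

A slope near 1 with a nonzero intercept, as for the hypothetical sensor_a, would point to a constant offset, whereas a lower squared correlation, as for sensor_b, would point to scatter in the retrievals.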

Also, the spatial statistics collected by MAPSS can provide insight into the spatial
consistency of spaceborne retrievals. For example, Figure 9 shows linear regression fits of (a)
mean and (b) central values (i.e., cval) of AOD retrievals from the same set of satellite
sensors against the corresponding interpolated AERONET data. The mean and cval plots
show fairly similar fits, with the MODIS cval fits being slightly better, indicating that
spaceborne MODIS AOD retrievals directly over Dakar (Senegal) are more accurate than the
overall retrievals in the area surrounding Dakar. This could probably be explained by
difficulties associated with retrieving aerosol properties over complex mixed environments
due to their inherent surface inhomogeneities, particularly given the tendency for a complex
mix of marine aerosols, Saharan dust, urban pollution, and smoke from biomass burning to
occur in that region.

Another synergistic use of the collocated satellite and AERONET data is that the unique
retrievals of aerosol layer heights provided by CALIPSO-CALIOP can be used to evaluate the
degree of uncertainty in the data produced by the other sensors that is due to the occurrence
of multiple aerosol layers or to variation in the heights of single or multiple layers. In the
example shown in Figure 10, AOD differences between ground-based observations from AERONET
and spaceborne observations from both POLDER (ocean) and OMI over Dakar are higher in
the presence of multiple aerosol layers, based on layer retrievals from CALIPSO-CALIOP.
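
One simple way to examine such a dependence is to group the satellite-minus-AERONET AOD differences by the number of aerosol layers reported for the collocated CALIOP sample. The Python sketch below is a hypothetical illustration: the field names (satellite_aod, aeronet_aod, n_layers), the use of the mean absolute difference, and the sample values are all assumptions, not the analysis behind Figure 10.

```python
def mean_abs_difference_by_layering(samples):
    """Average |satellite - ground| AOD for single- and multi-layer cases."""
    groups = {"single": [], "multiple": []}
    for s in samples:
        if s["n_layers"] < 1:
            continue  # no aerosol layer detected: leave the sample out
        key = "multiple" if s["n_layers"] > 1 else "single"
        groups[key].append(abs(s["satellite_aod"] - s["aeronet_aod"]))
    # Report only the groups that actually contain samples.
    return {k: sum(v) / len(v) for k, v in groups.items() if v}


# Placeholder samples for illustration only, not real retrievals.
samples = [
    {"satellite_aod": 0.30, "aeronet_aod": 0.25, "n_layers": 1},
    {"satellite_aod": 0.42, "aeronet_aod": 0.40, "n_layers": 1},
    {"satellite_aod": 0.60, "aeronet_aod": 0.45, "n_layers": 2},
    {"satellite_aod": 0.55, "aeronet_aod": 0.40, "n_layers": 3},
    {"satellite_aod": 0.20, "aeronet_aod": 0.20, "n_layers": 0},
]

differences = mean_abs_difference_by_layering(samples)
```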

Finally, the MAPSS database can be used as a data source for other aerosol investigation
projects. For example, the AeroStat system (http://giovanni.gsfc.nasa.gov/aerostat/) uses the
MAPSS database for assessing systematic biases that may exist in aerosol
measurements retrieved by different sensors.


Conclusions

The Multi-sensor Aerosol Products Sampling System (MAPSS) provides a consistent
sampling approach that enables easy, direct, and uniform inter-comparison and validation of
the diverse aerosol products from different satellite sensors. The range of
statistics collected in MAPSS facilitates the investigation of various spatio-temporal
properties of aerosols, as observed from multiple sensors with complementary capabilities,
thereby helping to expand our understanding of the distribution and environmental impact of
aerosols from different perspectives at local scales, with the possibility of extension by
aggregation to global scales. Indeed, the readily available unified access to distinct aerosol
parameters from multiple sensors provides a platform for acquiring a more complete
understanding of the inter-relationships that may exist between the different physical
properties of aerosols, which cannot all be measured from one or even a few sensors. It is
expected that the MAPSS system will open the way for a multitude of synergistic aerosol
studies, some of which have probably not been considered to date.


Table 1. Atmospheric aerosol measurement instruments and products supported in MAPSS.
The indicated equator crossing times are based on the original orbital designs, and can change
during the lifetimes of the satellites.

| Sensor | Platform | Product | Spatial resolution | Equator crossing time | Data period |
|---|---|---|---|---|---|
| AERONET | N/A | AOT, SDA, INV | N/A | N/A | Varies with sites |
| MODIS | Terra; Aqua | MOD04, MOD05; MYD04, MYD05 | 10x10 km | [illegible] | Jan '00-; Jul '02- |
| MISR | Terra | MIL2ASAE | 17.6x17.6 km | 10:30 am | Jan '00- |
| OMI | Aura | OMAERUV | 13.7x23.7 km | | Oct '04- |
| POLDER | ADEOS; ADEOS-2; PARASOL | [illegible] | 19x19 km | 1:30 pm | Oct '96-Jun '97; Apr '03-Oct '03; Mar '05- |
| CALIOP | CALIPSO | [illegible] | 5x5 km | | |
| SeaWiFS | SeaStar | SWDB | 13.5x13.5 km | | Jan '98-Dec '10 |

Table 2. Summary of the quality assurance / quality control (QA) flags and their values for
the aerosol products supported in MAPSS.

| Products | QA flag (name in MAPSS) | Usage notes | Values |
|---|---|---|---|
| MOD04, MYD04 | QA-l | Land datasets | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| MOD04, MYD04 | QAavg-o | Ocean datasets | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| MOD04, MYD04 | QAdpbl-l | Deep Blue datasets | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| MIL2ASAE | QAb | AOD and Angstrom Exponent (AExp) | 0=Successful retrieval, single mixture; 1=Successful retrieval, multiple mixtures; 2=Data filled by averaging neighboring pixels; 3=No retrieval |
| MIL2ASAE | QApprop | | 0=Good; 1=Bad |
| OMAERUV | QAfaf | AOD, Absorption AOD, and SSA | 0=Most reliable; 1=Reliable; 2=Less reliable |
| P[1-3]L2TLGC, P[1-3]L2TOGC | [illegible] | | Real number between 0 (Bad) and 1 (Excellent) |
| | [illegible] | [illegible] | 32-bit flag field |
| 05kmALay | flagDay | Indicates daylight retrievals | 0=No, 1=Yes |
| 05kmALay | flagStratFeature | Indicates detected stratospheric features | 0=No, 1=Yes |
| 05kmALay | flagBaseExtended | Indicates that the algorithm increased the vertical extent of the ground layer | 0=No, 1=Yes |
| 05kmALay | modeNLayers | Mode of number of layers over all sampled pixels | Number of layers: [0, 8] |
| 05kmALay | cNLayers | Number of layers over the central pixel | Number of layers: [0, 8] |
| 05kmALay | CADscoreX | Additional layer-specific flags indicating confidence in retrievals. X stands for layer index, X=[1,8]. | Real number: [0,1] |
| 05kmALay | [illegible] | | 16-bit flag field |
| SWDB | QAaod | AOD - land and ocean | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| SWDB | QAaod-l | AOD - land | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| SWDB | QAaod-o | AOD - ocean | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| SWDB | QAaexp | AExp - land and ocean | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| SWDB | QAaexp-l | AExp - land | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |
| SWDB | [illegible] | AExp - ocean | 0=No confidence; 1=Marginal; 2=Good; 3=Very good |

Table 3. Correlation coefficient (R) between mean values of AERONET AOD and Terra
MODIS AOD at 550nm depending on different MODIS data screening scenarios, and also
between cval (value of the central pixel in the sample) values of these two products. Factors
considered for the screening include QA (Quality Assurance) flags of individual data pixels,
mode of QA flags of all valid pixels in the sample, and nval (number of valid pixels in the
sample). A MODIS QA value of 3 indicates the data with the best quality. Correlation values
for the medians of AOD are not shown as they are only marginally different from the reported
values for the means of AOD. AERONET data were interpolated to 550nm and screened to
have nval of at least 4.

R of AOD at 550nm, MOD04, V.5.1

| Statistic | Dataset | Screening | nval: all | nval: >=20% (>=5) | nval: [1,5] | nval: [6,10] | nval: [11,20] | nval: [21,26] |
|---|---|---|---|---|---|---|---|---|
| mean | Land and Ocean | No QA filtering | 0.84 | 0.86 | 0.79 | 0.86 | 0.87 | 0.91 |
| mean | Land and Ocean | QA mode=3 | 0.90 | 0.90 | 0.90 | 0.91 | 0.90 | 0.91 |
| mean | Land and Ocean | QA pixel=3 | 0.88 | 0.91 | 0.85 | 0.90 | 0.92 | 0.93 |
| mean | Land | No QA filtering | 0.82 | 0.86 | 0.75 | 0.85 | 0.87 | 0.91 |
| mean | Land | QA mode=3 | 0.90 | 0.90 | 0.88 | 0.90 | 0.90 | 0.91 |
| mean | Land | QA pixel=3 | 0.82 | 0.86 | 0.75 | 0.85 | 0.87 | 0.91 |
| mean | Ocean | No QA filtering | 0.89 | 0.92 | 0.88 | 0.92 | 0.93 | 0.91 |
| mean | Ocean | QA mode=3 | 0.93 | 0.92 | 0.92 | 0.93 | 0.97 | 0.86 |
| mean | Ocean | QA pixel=3 | 0.89 | 0.92 | 0.88 | 0.92 | 0.93 | 0.91 |
| mean | Deep Blue | No QA filtering | 0.70 | 0.69 | 0.74 | 0.75 | 0.60 | 0.73 |
| mean | Deep Blue | QA mode=3 | 0.79 | 0.79 | 0.80 | 0.82 | 0.77 | 0.76 |
| mean | Deep Blue | QA pixel=3 | 0.75 | 0.84 | 0.70 | 0.82 | 0.88 | 0.89 |
| cval | Land and Ocean | No QA filtering | 0.83 | 0.84 | 0.82 | 0.84 | 0.82 | 0.86 |
| cval | Land and Ocean | QA mode=3 | 0.88 | 0.88 | 0.87 | 0.90 | 0.86 | 0.87 |
| cval | Land and Ocean | QA pixel=3 | 0.90 | 0.91 | 0.86 | 0.91 | 0.91 | 0.92 |
| cval | Land | No QA filtering | 0.83 | 0.84 | 0.81 | 0.84 | 0.82 | 0.86 |
| cval | Land | QA mode=3 | 0.87 | 0.88 | 0.87 | 0.90 | 0.86 | 0.87 |
| cval | Land | QA pixel=3 | 0.83 | 0.84 | 0.82 | 0.84 | 0.82 | 0.86 |
| cval | Ocean | No QA filtering | 0.93 | 0.92 | 0.92 | 0.93 | 0.93 | 0.97 |
| cval | Ocean | QA mode=3 | 0.94 | 0.94 | 0.94 | 0.94 | 0.97 | N/A |
| cval | Ocean | QA pixel=3 | 0.93 | 0.92 | 0.93 | 0.92 | 0.93 | 0.97 |
| cval | Deep Blue | No QA filtering | 0.72 | 0.72 | 0.78 | 0.78 | 0.63 | 0.71 |
| cval | Deep Blue | QA mode=3 | 0.80 | 0.80 | 0.81 | 0.83 | 0.78 | 0.77 |
| cval | Deep Blue | QA pixel=3 | 0.85 | 0.86 | 0.82 | 0.84 | 0.89 | 0.90 |

Table 4. Correlation coefficient (R) between means, medians, and cvals (value of the central
pixel in the sample) of AERONET AOD and OMI AOD at 500 nm depending on different
OMI data screening scenarios. Factors considered for the screening include QA (Quality
Assurance) flags of individual data pixels, mode of QA flags of all valid pixels in the sample,
and nval (number of valid pixels in the sample). An OMI QA value of 0 indicates the data with
the best quality. AERONET data were interpolated to 500 nm and screened to have nval of at
least 4. OMI data were additionally screened to exclude data points with AOD>=6, which
comprise 130 of approximately 50,000 data points when computing the statistics for "No QA
filtering", and 5 of approximately 35,000 data points when computing the statistics for "QA
mode=0".

R of AOD at 500nm, OMAERUV, V.3

| Statistic | Screening | nval: all | nval: >=20% (>=2) | nval: [1,5] | nval: [6,10] |
|---|---|---|---|---|---|
| mean | No QA filtering | 0.41 | 0.43 | 0.44 | 0.42 |
| mean | QA mode=0 | 0.52 | 0.51 | 0.56 | 0.42 |
| mean | QA pixel=0 | 0.55 | 0.55 | 0.57 | 0.52 |
| median | No QA filtering | 0.42 | 0.45 | 0.42 | 0.45 |
| median | QA mode=0 | 0.56 | 0.56 | 0.60 | 0.49 |
| median | QA pixel=0 | 0.55 | [illegible: "037"] | 0.56 | 0.53 |
| cval | No QA filtering | 0.43 | 0.43 | 0.41 | 0.41 |
| cval | QA mode=0 | 0.51 | 0.52 | 0.57 | 0.41 |
| cval | QA pixel=0 | 0.54 | 0.57 | 0.56 | 0.46 |
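
The screening scenarios behind Tables 3 and 4 combine a QA rule, an nval range, and, for OMI, an upper AOD limit before the correlation is computed. The Python sketch below illustrates that workflow under stated assumptions: the field names (sat_aod, aeronet_aod, nval, qa_mode) and the helper names are hypothetical, the correlation is taken to be the Pearson coefficient, and the sample values are placeholders rather than MAPSS data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen(samples, qa_mode=None, nval_range=None, max_aod=None):
    """Apply optional screening rules; passing None disables a rule."""
    kept = []
    for s in samples:
        if qa_mode is not None and s["qa_mode"] != qa_mode:
            continue
        if nval_range is not None and not (
            nval_range[0] <= s["nval"] <= nval_range[1]
        ):
            continue
        if max_aod is not None and s["sat_aod"] >= max_aod:
            continue  # e.g. exclude points with AOD >= 6
        kept.append(s)
    return kept


def correlation(samples):
    """R between satellite and AERONET AOD for the screened samples."""
    return pearson_r([s["sat_aod"] for s in samples],
                     [s["aeronet_aod"] for s in samples])


# Placeholder samples for illustration only, not real retrievals.
samples = [
    {"sat_aod": 0.10, "aeronet_aod": 0.12, "nval": 3, "qa_mode": 0},
    {"sat_aod": 0.30, "aeronet_aod": 0.32, "nval": 4, "qa_mode": 0},
    {"sat_aod": 0.50, "aeronet_aod": 0.52, "nval": 8, "qa_mode": 0},
    {"sat_aod": 0.90, "aeronet_aod": 0.20, "nval": 2, "qa_mode": 2},
    {"sat_aod": 6.50, "aeronet_aod": 0.40, "nval": 5, "qa_mode": 0},
]

no_qa = screen(samples, max_aod=6.0)             # "No QA filtering"
qa_best = screen(samples, qa_mode=0, max_aod=6.0)  # "QA mode=0"
r_no_qa = correlation(no_qa)
r_qa_best = correlation(qa_best)
```

Each cell of the tables would correspond to one call of this kind with a different combination of QA rule and nval range.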


[Figure 1 image. Annotation: Sun photometer data subset time interval: 1 hour (30 minutes
before and after a satellite overpass)]

Figure 1. Overview of the sampling framework used in MAPSS. Sampling of each spatial
spaceborne aerosol product involves extracting values of the pixels that fall within an
approximate radius of 27.5 km from the chosen locations. Similarly, ground-based temporal
observations in a particular location are sampled from measurements taken within 30 minutes
before and 30 minutes after a satellite passes over this location.
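
The sampling rules in the Figure 1 caption can be sketched in Python as follows. Only the 27.5 km radius and the 30-minute half-window come from the caption; the great-circle (haversine) distance, the use of the nearest pixel as the central value, the sample standard deviation, the tuple layouts, and all coordinates and values are assumptions made for this illustration and are not the MAPSS implementation.

```python
import math
import statistics
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_pixels(pixels, site_lat, site_lon, radius_km=27.5):
    """Statistics of (lat, lon, value) pixels within radius_km of the site."""
    near = [(haversine_km(site_lat, site_lon, lat, lon), value)
            for lat, lon, value in pixels]
    near = [(d, v) for d, v in near if d <= radius_km]
    if not near:
        return None
    values = [v for _, v in near]
    return {
        "nval": len(values),
        "mean": statistics.mean(values),
        "sdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "cval": min(near, key=lambda dv: dv[0])[1],  # nearest pixel
    }


def sample_times(observations, overpass, half_window=timedelta(minutes=30)):
    """Values of (time, value) observations within the window of the overpass."""
    return [v for t, v in observations if abs(t - overpass) <= half_window]


# Placeholder site, pixels, and observations for illustration only.
site = (38.99, -76.84)
pixels = [
    (38.99, -76.84, 0.20),  # at the site
    (39.10, -76.84, 0.22),  # roughly 12 km north
    (38.99, -76.60, 0.24),  # roughly 21 km east
    (39.50, -76.84, 0.50),  # roughly 57 km north, outside the radius
]
spatial = sample_pixels(pixels, *site)

overpass = datetime(2010, 7, 6, 18, 8)
observations = [
    (datetime(2010, 7, 6, 17, 30), 0.19),  # 38 minutes before, excluded
    (datetime(2010, 7, 6, 17, 45), 0.20),
    (datetime(2010, 7, 6, 18, 20), 0.21),
    (datetime(2010, 7, 6, 18, 30), 0.22),
    (datetime(2010, 7, 6, 18, 50), 0.25),  # 42 minutes after, excluded
]
temporal = sample_times(observations, overpass)
```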